The Sqoop import tool imports individual tables from an RDBMS into HDFS. Each row of a table is treated as a record in HDFS. Records are stored as text data in text files, or as binary data in Avro or SequenceFiles. Here the import command is used to import a table from a MySQL database into HDFS.
Import table data to HDFS (the output file is delimited text by default)
sqoop import --connect jdbc:mysql://localhost:3306/dwdev --username root --password cloudera --table emp;
sqoop import --connect jdbc:mysql://localhost:3306/dwdev --username root --password cloudera --table emp -m 2;
Sqoop imports data in parallel from most database sources. You can specify the number of map tasks (parallel processes) used to perform the import with the -m or --num-mappers argument. Each of these arguments takes an integer value corresponding to the degree of parallelism to employ, and you can control the number of mappers independently of the number of files present in the directory. Import performance depends on the degree of parallelism. By default, Sqoop uses four tasks in parallel. This may not be optimal, so you will need to experiment with your own particular setup. Additional tasks may offer better concurrency, but if the database is already bottlenecked, the additional load may decrease performance.
Import table data to HDFS (Tab-separated file format)
By default, Sqoop uses four tasks in parallel:
sqoop-import --options-file /home/cloudera/Desktop/data/pram.txt --fields-terminated-by '\t' --table emp
We can set the number of mappers with the -m option:
sqoop-import --options-file /home/cloudera/Desktop/data/pram.txt --fields-terminated-by '\t' --table emp -m 1
--delete-target-dir removes the target directory, if it already exists, before the import runs:
sqoop-import --options-file /home/cloudera/Desktop/data/pram.txt --fields-terminated-by '\t' --table emp --delete-target-dir -m 1
Import data with the --split-by option
--split-by specifies the column of the table used to generate splits for the import, i.e. which column is used to divide the work when importing data into the cluster. It can improve import performance by achieving greater parallelism: Sqoop creates splits based on the values of the column named by --split-by, and if it is not supplied, the primary key of the input table is used. If the table has no primary key, we must either set the number of mappers to 1 (a sequential import) or, for a parallel import, explicitly pass --split-by "some column name" and then specify any number of mappers; that many part-m files will be generated.
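To illustrate how splits are derived, the sketch below divides a split column's range evenly among the mappers. This is a simplified model, not Sqoop's internal code: the min/max values are made up, and in reality Sqoop obtains them by running SELECT MIN(empid), MAX(empid) against the table.

```shell
# Simplified sketch: suppose empid runs from 1 to 100 and 4 mappers are used.
min=1; max=100; m=4
range=$(( (max - min + 1) / m ))       # rows per mapper
for i in 0 1 2 3; do
  lo=$(( min + i * range ))
  hi=$(( lo + range - 1 ))
  [ "$i" -eq $(( m - 1 )) ] && hi=$max # last mapper takes any remainder
  echo "mapper $i: WHERE empid >= $lo AND empid <= $hi"
done
```

Each mapper then imports only the rows falling inside its boundary condition, which is why the split column should be evenly distributed for good parallelism.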
sqoop import --options-file /home/cloudera/Desktop/data/pram.txt --table emp --target-dir=/sqoop/ImportSplit --fields-terminated-by '|' --where 'esal > 22000' --split-by empid -m 1;
sqoop import --options-file /home/cloudera/Desktop/data/pram.txt --table emp --target-dir=/sqoop/ImportSplit --fields-terminated-by '|' --where 'esal > 22000' --split-by empid -m 4;
Import table data to HDFS (Save to a target directory)
sqoop import --connect jdbc:mysql://localhost:3306/dwdev --username root --password cloudera --table emp --target-dir=/sqoop/sqoop_data1 -m 1;
sqoop import --connect jdbc:mysql://localhost:3306/dwdev --username root --password cloudera --table emp --target-dir=/sqoop/Import1 -m 2;
sqoop import --connect jdbc:mysql://localhost:3306/retail_db --username root --password cloudera --table orders -m 1;
Pass Parameter file to Sqoop
sqoop-list-tables --options-file /home/cloudera/Desktop/data/pram.txt;
cat > pram.txt <<'EOF'
--connect
jdbc:mysql://localhost:3306/dwdev
--username
root
--password
cloudera
EOF
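An options file simply supplies one command-line argument per line (lines beginning with # are treated as comments). Conceptually, Sqoop splices those lines back into the command line; the sketch below mimics that expansion for illustration and is not Sqoop's actual parser.

```shell
# Write a minimal options file, one argument per line,
# then rebuild the effective command line from it.
printf '%s\n' --connect jdbc:mysql://localhost:3306/dwdev \
              --username root --password cloudera > pram.txt
args=$(grep -v '^#' pram.txt | tr '\n' ' ')   # drop comments, rejoin
echo "sqoop list-tables $args"
```

Keeping connection parameters in a file like this avoids repeating them in every command in the sections below.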
Import Data in Append Mode
Sqoop can append data to an existing data set by adding the --append argument. In append mode, files are first written to a temporary location; if all data is imported successfully, they are moved to the destination. If any error occurs, the import is considered failed and the temporary files are deleted.
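The staging behaviour described above can be sketched as follows. Local directories stand in for HDFS paths here, and the file names are illustrative, not Sqoop internals:

```shell
# Conceptual sketch of --append staging.
tmp=$(mktemp -d)                      # temporary import location
dest=$(mktemp -d)/sqoop_data1         # existing data set
mkdir -p "$dest"
echo "existing rows"       > "$dest/part-m-00000"
echo "newly imported rows" > "$tmp/part-m-00000"
# On success, new files are moved into the destination under
# names that do not clash with the files already there.
mv "$tmp/part-m-00000" "$dest/part-m-00001"
rmdir "$tmp"
ls "$dest"
```

The point is that the existing part files are never overwritten; the appended data arrives as additional part files alongside them.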
sqoop import --connect jdbc:mysql://localhost:3306/dwdev --username root --password cloudera --table emp --target-dir=/sqoop/sqoop_data1 -m 1 --append;
sqoop import --options-file /home/cloudera/Desktop/data/pram.txt --table emp --target-dir=/sqoop/sqoop_data1 -m 1 --append;
To import all tables from an RDBMS to HDFS
Note that import-all-tables writes each table under a common parent directory, so it takes --warehouse-dir rather than --target-dir.
sqoop import-all-tables --options-file /home/cloudera/Desktop/data/pram.txt -m 1; (imports to the default HDFS location)
sqoop import-all-tables --options-file /home/cloudera/Desktop/data/pram.txt --warehouse-dir=/sqoop/data -m 1;
Import table data to HDFS (where condition)
sqoop import --options-file /home/cloudera/Desktop/data/pram.txt --table emp --target-dir=/sqoop/ImportCond --fields-terminated-by '|' --where 'esal > 22000' -m 1;
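Conceptually, the --where predicate is spliced into the SELECT that Sqoop generates against the source table. The sketch below is illustrative only; the real generated query also carries the per-mapper split conditions:

```shell
# Assemble the kind of query the --where option produces.
table=emp
where='esal > 22000'
query="SELECT * FROM $table WHERE $where"
echo "$query"
```

Because the predicate is evaluated by the database, only the matching rows ever travel over the network to HDFS.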
Import table data to HDFS (Import only specific columns)
sqoop import --options-file /home/cloudera/Desktop/data/pram.txt -m 1 --table emp --columns 'ename,esal' --target-dir=data/sqoop_data4 --fields-terminated-by '|' ;
sqoop import --connect jdbc:mysql://localhost:3306/dwdev --username root --password cloudera --table emp --columns empid,esal --target-dir=data/sqoop_data5 --fields-terminated-by '\t' -m 1;
Import table data to HDFS (Exclude specific tables)
sqoop import-all-tables --options-file /home/cloudera/Desktop/data/pram.txt -m 1 --exclude-tables emp,student;
sqoop import-all-tables --options-file /home/cloudera/Desktop/data/pram.txt --exclude-tables emp,student --warehouse-dir=data/sqoop_data6 -m 1;